Customer Churn Prediction Using Machine Learning¶

Project 1

Customer churn, or attrition, is the loss of clients or customers from a company's client base. Predicting customer churn helps businesses identify at-risk customers and take preventive measures to retain them, ultimately improving customer satisfaction and reducing revenue loss. In this article, we will walk through a Jupyter Notebook that predicts customer churn using machine learning techniques.¶

I am using a sample dataset of a telecommunications company, which contains customer information, including demographic data, usage, and billing information. The target variable is Churn, which indicates if a customer has left the company.¶

Data Preprocessing¶

In this step, we load the dataset, clean the data, and preprocess it for machine learning. We remove unnecessary columns, handle missing values, and convert categorical features using one hot encoder.¶

In [10]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, recall_score, precision_score, f1_score

import xgboost as xgb
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest

from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor
from kneed import KneeLocator
from joblib import Parallel, delayed

import plotly.express as px
import plotly.graph_objs as go
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"


import warnings
warnings.filterwarnings("ignore")
In [2]:
# Preprocess data based on the provided data structure
def preprocess_data(df):
    # Handle missing values
    print(list(df.columns))
    df.replace('Unknown',0,inplace=True)
    df['HandsetPrice'] = df['HandsetPrice'].astype('float')
    df.fillna(df.mean(), inplace=True)
    df.fillna(df.mode().iloc[0], inplace=True)
    
    # Drop unnecessary columns (if any)
    # df = df.drop(['ColumnToDrop'], axis=1)
    
    return df

def remove_constant_features(df, threshold=0):
    # Identify constant features
    constant_features = [col for col in df.columns if df[col].nunique() <= threshold]
    print(f"Removing constant features: {constant_features}")
    
    # Drop constant features
    df = df.drop(columns=constant_features)
    
    return df

def load_and_preprocess_data(file_path):
    # Load the data
    df = pd.read_csv(file_path)
    
    # Perform preprocessing and feature extraction
    df = preprocess_data(df)

    # Remove constant features
    df = remove_constant_features(df)
    
    # Label encode the binary categorical columns
    binary_categorical_columns = ['ChildrenInHH', 'HandsetRefurbished', 'HandsetWebCapable', 'TruckOwner', 'RVOwner',
                                  'Homeownership', 'BuysViaMailOrder', 'RespondsToMailOffers', 'OptOutMailings',
                                  'NonUSTravel', 'OwnsComputer', 'HasCreditCard', 'NewCellphoneUser',
                                  'NotNewCellphoneUser', 'OwnsMotorcycle', 'MadeCallToRetentionTeam']
    for col in binary_categorical_columns:
        le = LabelEncoder()
        df[col] = le.fit_transform(df[col].astype(str))

    # Evaluate correlation and multicollinearity before one-hot encoding
    df = evaluate_correlation_and_multicollinearity(df)

    # One-hot encode the remaining categorical columns
    categorical_columns = ['ServiceArea', 'CreditRating', 'PrizmCode', 'Occupation', 'MaritalStatus']
    df = pd.get_dummies(df, columns=categorical_columns, drop_first=True)  # Add drop_first=True

    return df

Correlation and Multicollinearity¶

  • Correlation measures the strength and direction of the linear relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no correlation.
  • Multicollinearity refers to the presence of high correlations between multiple predictor variables in a regression model. It can lead to unstable estimates of regression coefficients and make it challenging to interpret the importance of each predictor.

The function performs the following steps:

  1. Calculate the correlation matrix for numerical variables: This step computes Pearson's correlation coefficient between all pairs of numerical features in the dataset. Pearson's correlation coefficient ranges from -1 to 1, with -1 indicating a strong negative correlation, 1 indicating a strong positive correlation, and 0 indicating no correlation.
  2. Use the Chi-squared test of independence for categorical variables: The Chi-squared test of independence is used to analyze the relationship between pairs of categorical features. A low p-value from the test indicates a strong association between the two variables.
  3. Calculate the Variance Inflation Factor (VIF) to detect multicollinearity among numerical variables: The VIF is a measure of how much the variance of an estimated regression coefficient is increased due to multicollinearity. A VIF value greater than the specified threshold indicates a high degree of multicollinearity.
  4. Use Cramér's V for categorical variables: Cramér's V is a measure of the strength of association between two categorical variables, with values ranging from 0 (no association) to 1 (perfect association).

After these four steps, the function identifies features that are highly correlated or have high multicollinearity by comparing their correlation coefficients, p-values, VIF values, and Cramér's V values against the specified thresholds. Finally, the function drops the identified correlated features from the dataset and returns the modified DataFrame. By using this function, we can effectively remove features that may introduce multicollinearity or high correlation in our dataset, which can help improve the performance and interpretability of machine learning models.

In [3]:
def evaluate_correlation_and_multicollinearity(df, correlation_threshold=0.8, vif_threshold=5, cramers_v_threshold=0.7):
    # Step 1: Calculate the correlation matrix for numerical variables
    numerical_cols = df.select_dtypes(include=['int64', 'float64']).columns
    #numerical_cols = numerical_cols.drop('Churn')  # Exclude 'Churn' from numerical_cols
    correlation_matrix = df[numerical_cols].corr()

    # Step 2: Use the Chi-squared test of independence for categorical variables
    categorical_cols = df.select_dtypes(include=['object']).columns
    categorical_cols = categorical_cols.drop('Churn')
    chi2_matrix = pd.DataFrame(index=categorical_cols, columns=categorical_cols)
    for col1 in categorical_cols:
        for col2 in categorical_cols:
            contingency_table = pd.crosstab(df[col1], df[col2])
            _, p_value, _, _ = stats.chi2_contingency(contingency_table)
            chi2_matrix.loc[col1, col2] = p_value

    # Step 3: Calculate the VIF to detect multicollinearity among numerical variables
    numerical_df = df[numerical_cols]#.drop(columns='Churn')
    vif = pd.DataFrame()
    vif['Feature'] = numerical_df.columns
    vif['VIF'] = [variance_inflation_factor(numerical_df.values, i) for i in range(numerical_df.shape[1])]

    # Step 4: Use Cramér's V for categorical variables
    def cramers_v(confusion_matrix):
        chi2 = stats.chi2_contingency(confusion_matrix)[0]
        n = confusion_matrix.sum().sum()
        phi2 = chi2 / n
        r, k = confusion_matrix.shape
        phi2corr = max(0, phi2 - ((k - 1) * (r - 1)) / (n - 1))
        rcorr = r - ((r - 1) ** 2) / (n - 1)
        kcorr = k - ((k - 1) ** 2) / (n - 1)
        return np.sqrt(phi2corr / min((kcorr - 1), (rcorr - 1)))

    cramers_v_matrix = pd.DataFrame(index=categorical_cols, columns=categorical_cols)
    for col1 in categorical_cols:
        for col2 in categorical_cols:
            confusion_matrix = pd.crosstab(df[col1], df[col2])
            cramers_v_value = cramers_v(confusion_matrix)
            cramers_v_matrix.loc[col1, col2] = cramers_v_value

    # Remove highly correlated numerical features
    correlated_features = set()
    for i in range(len(correlation_matrix.columns)):
        for j in range(i):
            if abs(correlation_matrix.iloc[i, j]) > correlation_threshold:
                colname = correlation_matrix.columns[i]
                correlated_features.add(colname)

    # Remove numerical features with high VIF
    high_vif_features = vif[vif['VIF'] > vif_threshold]['Feature']
    correlated_features.update
    high_vif_features.values.tolist()

    # Remove categorical features with low p-values (high association) and high Cramér's V values
    for col1 in categorical_cols:
        for col2 in categorical_cols:
            if (col1 != col2) and (chi2_matrix.loc[col1, col2] < (1 - correlation_threshold)) and (cramers_v_matrix.loc[col1, col2] > cramers_v_threshold):
                correlated_features.add(col1)

    # Drop correlated features
    df = df.drop(columns=list(correlated_features))

    return df
In [4]:
# Replace with the path to your customer data
file_path = "./data/cell2celltrain.csv"
df = load_and_preprocess_data(file_path)
['CustomerID', 'Churn', 'MonthlyRevenue', 'MonthlyMinutes', 'TotalRecurringCharge', 'DirectorAssistedCalls', 'OverageMinutes', 'RoamingCalls', 'PercChangeMinutes', 'PercChangeRevenues', 'DroppedCalls', 'BlockedCalls', 'UnansweredCalls', 'CustomerCareCalls', 'ThreewayCalls', 'ReceivedCalls', 'OutboundCalls', 'InboundCalls', 'PeakCallsInOut', 'OffPeakCallsInOut', 'DroppedBlockedCalls', 'CallForwardingCalls', 'CallWaitingCalls', 'MonthsInService', 'UniqueSubs', 'ActiveSubs', 'ServiceArea', 'Handsets', 'HandsetModels', 'CurrentEquipmentDays', 'AgeHH1', 'AgeHH2', 'ChildrenInHH', 'HandsetRefurbished', 'HandsetWebCapable', 'TruckOwner', 'RVOwner', 'Homeownership', 'BuysViaMailOrder', 'RespondsToMailOffers', 'OptOutMailings', 'NonUSTravel', 'OwnsComputer', 'HasCreditCard', 'RetentionCalls', 'RetentionOffersAccepted', 'NewCellphoneUser', 'NotNewCellphoneUser', 'ReferralsMadeBySubscriber', 'IncomeGroup', 'OwnsMotorcycle', 'AdjustmentsToCreditRating', 'HandsetPrice', 'MadeCallToRetentionTeam', 'CreditRating', 'PrizmCode', 'Occupation', 'MaritalStatus']
Removing constant features: []
In [12]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(df.head())
    display(df.info( show_counts=True))
CustomerID Churn MonthlyRevenue MonthlyMinutes TotalRecurringCharge DirectorAssistedCalls OverageMinutes RoamingCalls PercChangeMinutes PercChangeRevenues DroppedCalls BlockedCalls UnansweredCalls CustomerCareCalls ThreewayCalls OutboundCalls InboundCalls PeakCallsInOut OffPeakCallsInOut CallForwardingCalls CallWaitingCalls MonthsInService UniqueSubs ActiveSubs Handsets CurrentEquipmentDays AgeHH1 AgeHH2 ChildrenInHH HandsetRefurbished HandsetWebCapable TruckOwner RVOwner Homeownership BuysViaMailOrder RespondsToMailOffers OptOutMailings NonUSTravel OwnsComputer HasCreditCard RetentionCalls RetentionOffersAccepted NewCellphoneUser NotNewCellphoneUser ReferralsMadeBySubscriber IncomeGroup OwnsMotorcycle AdjustmentsToCreditRating HandsetPrice MadeCallToRetentionTeam ServiceArea_AIRAND864 ServiceArea_AIRASH828 ServiceArea_AIRAUG706 ServiceArea_AIRBEA843 ServiceArea_AIRCAM803 ServiceArea_AIRCHA843 ServiceArea_AIRCOL803 ServiceArea_AIRELI252 ServiceArea_AIRFLO843 ServiceArea_AIRGAF864 ServiceArea_AIRGEO843 ServiceArea_AIRGOL919 ServiceArea_AIRGRE864 ServiceArea_AIRGRN252 ServiceArea_AIRGWD864 ServiceArea_AIRHHI843 ServiceArea_AIRHIC828 ServiceArea_AIRJAC910 ServiceArea_AIRKIN252 ServiceArea_AIRMAR828 ServiceArea_AIRMOR828 ServiceArea_AIRMYR843 ServiceArea_AIRNEW803 ServiceArea_AIRNWB252 ServiceArea_AIRORA803 ServiceArea_AIRROA252 ServiceArea_AIRROC252 ServiceArea_AIRSAV912 ServiceArea_AIRSPA864 ServiceArea_AIRSUM803 ServiceArea_AIRWIL910 ServiceArea_AIRWIN252 ServiceArea_AIRWYV828 ServiceArea_APCANN443 ServiceArea_APCBAL410 ServiceArea_APCBEL443 ServiceArea_APCBET240 ServiceArea_APCEAS443 ServiceArea_APCFCH703 ServiceArea_APCFRD301 ServiceArea_APCFRE540 ServiceArea_APCLEE703 ServiceArea_APCLXT240 ServiceArea_APCSAL443 ServiceArea_APCSIL301 ServiceArea_APCSVP443 ServiceArea_APCWAL240 ServiceArea_APCWAR540 ServiceArea_APCWAS202 ServiceArea_APCWES443 ServiceArea_ATHHAM423 ServiceArea_ATHJHC423 ServiceArea_ATHKIN423 ServiceArea_ATHLIM423 ServiceArea_ATLALB912 ServiceArea_ATLANE678 ServiceArea_ATLATH706 ServiceArea_ATLATL678 ServiceArea_ATLATN423 ServiceArea_ATLBRU912 ServiceArea_ATLCHA423 ServiceArea_ATLCHN706 ServiceArea_ATLCOL706 ServiceArea_ATLDAL334 ServiceArea_ATLDBL478 ServiceArea_ATLDOT334 ServiceArea_ATLDTN706 ServiceArea_ATLJCK901 ServiceArea_ATLKNO423 ServiceArea_ATLLAG706 ServiceArea_ATLMAC912 ServiceArea_ATLMDV478 ServiceArea_ATLMEM901 ServiceArea_ATLNOR678 ServiceArea_ATLOPE334 ServiceArea_ATLOVB601 ServiceArea_ATLPRR478 ServiceArea_ATLROS678 ServiceArea_ATLSEN662 ServiceArea_ATLSWT423 ServiceArea_ATLTUN601 ServiceArea_ATLVAL229 ServiceArea_ATLWMP870 ServiceArea_AWIAPP920 ServiceArea_AWIFON920 ServiceArea_AWIGRE920 ServiceArea_AWIMAN920 ServiceArea_AWIOSH920 ServiceArea_AWISHE920 ServiceArea_BIRBIR205 ServiceArea_BIRJAS205 ServiceArea_BIRPEL205 ServiceArea_BOSBOS508 ServiceArea_BOSBOS617 ServiceArea_BOSBOS781 ServiceArea_BOSBOS978 ServiceArea_BOSBRA781 ServiceArea_BOSFRA508 ServiceArea_BOSHYA508 ServiceArea_BOSMAN603 ServiceArea_BOSNSH603 ServiceArea_BOSPRO401 ServiceArea_BOSPTL207 ServiceArea_BOSWOR508 ServiceArea_CHIBLO309 ServiceArea_CHICHA217 ServiceArea_CHICHI312 ServiceArea_CHICHI773 ServiceArea_CHICPT219 ServiceArea_CHIDAV319 ServiceArea_CHIDEC217 ServiceArea_CHIDEK815 ServiceArea_CHIGRY219 ServiceArea_CHIJOL815 ServiceArea_CHIKAN815 ServiceArea_CHILAG630 ServiceArea_CHILAG708 ServiceArea_CHILAS815 ServiceArea_CHIMTV618 ServiceArea_CHINBK847 ServiceArea_CHIPEO309 ServiceArea_CHIRCK815 ServiceArea_CHIROC309 ServiceArea_CHISPR217 ServiceArea_CHIWAT319 ServiceArea_DALATH903 ServiceArea_DALCOM903 ServiceArea_DALCRS903 ServiceArea_DALDAL214 ServiceArea_DALDEN903 ServiceArea_DALDTN940 ServiceArea_DALFTW817 ServiceArea_DALGRE903 ServiceArea_DALGVL940 ServiceArea_DALKAU469 ServiceArea_DALMNW940 ServiceArea_DALMVN903 ServiceArea_DALSHR903 ServiceArea_DALSLS903 ServiceArea_DALSTV254 ServiceArea_DENBOU303 ServiceArea_DENCOL719 ServiceArea_DENDEN303 ServiceArea_DENDIL970 ServiceArea_DENFTC970 ServiceArea_DENGLD303 ServiceArea_DENGRE970 ServiceArea_DENVAL970 ServiceArea_DETADR517 ServiceArea_DETANN734 ServiceArea_DETBAT616 ServiceArea_DETBNH616 ServiceArea_DETBWG419 ServiceArea_DETDET313 ServiceArea_DETFER248 ServiceArea_DETFLI810 ServiceArea_DETFRE419 ServiceArea_DETJAC517 ServiceArea_DETKAL616 ServiceArea_DETLAN517 ServiceArea_DETMON734 ServiceArea_DETNOR248 ServiceArea_DETPON248 ServiceArea_DETROS810 ServiceArea_DETSOU248 ServiceArea_DETTOL419 ServiceArea_DETTRO248 ServiceArea_DETWAS419 ServiceArea_DETWYN734 ServiceArea_FLNARC863 ServiceArea_FLNAVO863 ServiceArea_FLNBAR863 ServiceArea_FLNBEL352 ServiceArea_FLNBRD941 ServiceArea_FLNBSH352 ServiceArea_FLNCLR813 ServiceArea_FLNCOC407 ServiceArea_FLNCRY352 ServiceArea_FLNDAY904 ServiceArea_FLNEUS352 ServiceArea_FLNFRN904 ServiceArea_FLNGAN352 ServiceArea_FLNHNC863 ServiceArea_FLNINV352 ServiceArea_FLNJAC904 ServiceArea_FLNKEH352 ServiceArea_FLNKEN321 ServiceArea_FLNKIS407 ServiceArea_FLNLAK941 ServiceArea_FLNLEE352 ServiceArea_FLNLKC904 ServiceArea_FLNLKP863 ServiceArea_FLNLKW863 ServiceArea_FLNNPR813 ServiceArea_FLNOCA352 ServiceArea_FLNOGC904 ServiceArea_FLNORL407 ServiceArea_FLNPAL904 ServiceArea_FLNSAG904 ServiceArea_FLNSAN407 ServiceArea_FLNSAR941 ServiceArea_FLNSEB863 ServiceArea_FLNSMY904 ServiceArea_FLNSTK904 ServiceArea_FLNTAL850 ServiceArea_FLNTAM813 ServiceArea_FLNWAU863 ServiceArea_FLNWIL352 ServiceArea_FLNWNH941 ServiceArea_FLNWNP407 ServiceArea_FLNZEP813 ServiceArea_GCWBTR225 ServiceArea_GCWGUL228 ServiceArea_GCWLAF337 ServiceArea_HARBRI203 ServiceArea_HARHAR860 ServiceArea_HARLON860 ServiceArea_HARNEW203 ServiceArea_HARNOR203 ServiceArea_HARSPR413 ServiceArea_HARWAT203 ServiceArea_HOPNAN570 ServiceArea_HOUANG409 ServiceArea_HOUBMT409 ServiceArea_HOUBRN409 ServiceArea_HOUCON409 ServiceArea_HOUFRE409 ServiceArea_HOUGLV409 ServiceArea_HOUHOU281 ServiceArea_HOUHUN936 ServiceArea_HOULJK409 ServiceArea_HOUSPR832 ServiceArea_HOUVIC361 ServiceArea_HWIHON808 ServiceArea_HWIMAU808 ServiceArea_INDAND765 ServiceArea_INDCIC317 ServiceArea_INDCLO765 ServiceArea_INDCRA765 ServiceArea_INDFRA765 ServiceArea_INDIND317 ServiceArea_INDLAF765 ServiceArea_INDMUN765 ServiceArea_INHBLU419 ServiceArea_INHCEL419 ServiceArea_INHCRI419 ServiceArea_INHDFN419 ServiceArea_INHFIN419 ServiceArea_INHFTW219 ServiceArea_INHHIC419 ServiceArea_INHKOK765 ServiceArea_INHSBN219 ServiceArea_INHVNW419 ServiceArea_INUEVA812 ServiceArea_IPMGDR616 ServiceArea_IPMHOL616 ServiceArea_IPMMID517 ServiceArea_IPMSAG517 ServiceArea_KCYCLI660 ServiceArea_KCYELD316 ServiceArea_KCYHES316 ServiceArea_KCYHNV816 ServiceArea_KCYHUT316 ServiceArea_KCYKCK913 ServiceArea_KCYKCM816 ServiceArea_KCYLAW913 ServiceArea_KCYLEA913 ServiceArea_KCYNEW316 ServiceArea_KCYOTW785 ServiceArea_KCYTOP913 ServiceArea_KCYWAR660 ServiceArea_KCYWIC316 ServiceArea_LAUBRO601 ServiceArea_LAUCLM662 ServiceArea_LAUGNW662 ServiceArea_LAUHAT601 ServiceArea_LAUJAC601 ServiceArea_LAUNAT601 ServiceArea_LAUTUP662 ServiceArea_LAWBUM409 ServiceArea_LAXALA562 ServiceArea_LAXALB626 ServiceArea_LAXANA714 ServiceArea_LAXBEV310 ServiceArea_LAXBUR818 ServiceArea_LAXCAN661 ServiceArea_LAXCAS661 ServiceArea_LAXCDG310 ServiceArea_LAXCOR909 ServiceArea_LAXCOV626 ServiceArea_LAXCUL310 ServiceArea_LAXDOW562 ServiceArea_LAXIND760 ServiceArea_LAXING310 ServiceArea_LAXIRV949 ServiceArea_LAXLAG949 ServiceArea_LAXLAN661 ServiceArea_LAXLAX213 ServiceArea_LAXLAX323 ServiceArea_LAXMON323 ServiceArea_LAXOAK805 ServiceArea_LAXONT909 ServiceArea_LAXOXN805 ServiceArea_LAXPAS626 ServiceArea_LAXPER909 ServiceArea_LAXPSG760 ServiceArea_LAXRIV909 ServiceArea_LAXSAN714 ServiceArea_LAXSBN909 ServiceArea_LAXSFN818 ServiceArea_LAXSIM805 ServiceArea_LAXSJC949 ServiceArea_LAXSMN310 ServiceArea_LAXSNP310 ServiceArea_LAXVEN805 ServiceArea_LAXVIC760 ServiceArea_LAXVNY818 ServiceArea_LAXWES310 ServiceArea_LOUCOR812 ServiceArea_LOUETN502 ServiceArea_LOUFRK502 ServiceArea_LOULEX606 ServiceArea_LOULOU502 ServiceArea_LOUNAL812 ServiceArea_MIABEL561 ServiceArea_MIABON941 ServiceArea_MIADEL561 ServiceArea_MIADFD954 ServiceArea_MIAFTL954 ServiceArea_MIAFTM941 ServiceArea_MIAHWD954 ServiceArea_MIAJUP561 ServiceArea_MIAKEY305 ServiceArea_MIAMAR305 ServiceArea_MIAMIA305 ServiceArea_MIANAP941 ServiceArea_MIANDA305 ServiceArea_MIAOKE863 ServiceArea_MIAPOR941 ServiceArea_MIAPSL561 ServiceArea_MIASUG305 ServiceArea_MIAVER561 ServiceArea_MIAWPB561 ServiceArea_MILBEL608 ServiceArea_MILBUR262 ServiceArea_MILJAN608 ServiceArea_MILJNC920 ServiceArea_MILKEN414 ServiceArea_MILLAK262 ServiceArea_MILLKM920 ServiceArea_MILMAD608 ServiceArea_MILMIL414 ServiceArea_MILRAC414 ServiceArea_MILWAT920 ServiceArea_MILWAU262 ServiceArea_MINBLO952 ServiceArea_MINCOR763 ServiceArea_MINMIN612 ServiceArea_MINSTP612 ServiceArea_NCRALB704 ServiceArea_NCRASH336 ServiceArea_NCRCHA704 ServiceArea_NCRCHE757 ServiceArea_NCRCLA919 ServiceArea_NCRCON704 ServiceArea_NCRCRY919 ServiceArea_NCRDNN910 ServiceArea_NCRDUR919 ServiceArea_NCRFAY910 ServiceArea_NCRGRB757 ServiceArea_NCRGRE336 ServiceArea_NCRGST704 ServiceArea_NCRHAR704 ServiceArea_NCRHEN252 ServiceArea_NCRHGP336 ServiceArea_NCRKAN704 ServiceArea_NCRLUM910 ServiceArea_NCRMID704 ServiceArea_NCRMIL803 ServiceArea_NCRMON704 ServiceArea_NCRNWN757 ServiceArea_NCROXF919 ServiceArea_NCRPIT919 ServiceArea_NCRPOR757 ServiceArea_NCRPTR804 ServiceArea_NCRRAL919 ServiceArea_NCRRIC804 ServiceArea_NCRROC803 ServiceArea_NCRSAL704 ServiceArea_NCRSAN919 ServiceArea_NCRSHE704 ServiceArea_NCRSIC919 ServiceArea_NCRSMF757 ServiceArea_NCRSMI919 ServiceArea_NCRSPN910 ServiceArea_NCRVIR757 ServiceArea_NCRWAK919 ServiceArea_NCRWIN336 ServiceArea_NCRWLM757 ServiceArea_NCRYOR803 ServiceArea_NEVCHU619 ServiceArea_NEVCOR619 ServiceArea_NEVELC619 ServiceArea_NEVENC760 ServiceArea_NEVESC760 ServiceArea_NEVLAU702 ServiceArea_NEVLMS619 ServiceArea_NEVLVS702 ServiceArea_NEVNAT619 ServiceArea_NEVOCN760 ServiceArea_NEVPOW619 ServiceArea_NEVSDG619 ServiceArea_NMCGDJ970 ServiceArea_NMCPUE719 ServiceArea_NMXABI915 ServiceArea_NMXALB505 ServiceArea_NMXAMA806 ServiceArea_NMXDEL830 ServiceArea_NMXEAG830 ServiceArea_NMXELP915 ServiceArea_NMXFLA520 ServiceArea_NMXLAR956 ServiceArea_NMXLCR505 ServiceArea_NMXLSA505 ServiceArea_NMXLUB806 ServiceArea_NMXPLA806 ServiceArea_NMXPRE520 ServiceArea_NMXSAN505 ServiceArea_NMXSAN915 ServiceArea_NMXTER915 ServiceArea_NMXYUM520 ServiceArea_NNYALB518 ServiceArea_NNYBUF716 ServiceArea_NNYBUR914 ServiceArea_NNYPOU914 ServiceArea_NNYROC716 ServiceArea_NNYSYR315 ServiceArea_NNYUTI315 ServiceArea_NOLBOG504 ServiceArea_NOLKEN504 ServiceArea_NOLPIC601 ServiceArea_NOLPOH504 ServiceArea_NORALX320 ServiceArea_NORDUL218 ServiceArea_NORFAR701 ServiceArea_NORFER218 ServiceArea_NORFRM218 ServiceArea_NORMAN507 ServiceArea_NOROWT507 ServiceArea_NORRDW651 ServiceArea_NORROC507 ServiceArea_NORSTC320 ServiceArea_NORZIM763 ServiceArea_NSHCOL615 ServiceArea_NSHNSH615 ServiceArea_NSHSPR615 ServiceArea_NVUGAR775 ServiceArea_NVUREN775 ServiceArea_NYCBRO917 ServiceArea_NYCCIT914 ServiceArea_NYCETT732 ServiceArea_NYCFHD732 ServiceArea_NYCJER201 ServiceArea_NYCKPT732 ServiceArea_NYCMAN917 ServiceArea_NYCMTK914 ServiceArea_NYCNAS516 ServiceArea_NYCNEW201 ServiceArea_NYCNEW732 ServiceArea_NYCNEW908 ServiceArea_NYCNEW973 ServiceArea_NYCPAS973 ServiceArea_NYCPLA908 ServiceArea_NYCPLS609 ServiceArea_NYCQUE917 ServiceArea_NYCSUF516 ServiceArea_NYCTMR732 ServiceArea_NYCWHI914 ServiceArea_NYCWOO732 ServiceArea_OHHASH606 ServiceArea_OHHATH740 ServiceArea_OHHCAM740 ServiceArea_OHHCHA304 ServiceArea_OHHCHI740 ServiceArea_OHHCLA304 ServiceArea_OHHFAI304 ServiceArea_OHHGAL740 ServiceArea_OHHHUN304 ServiceArea_OHHJAC740 ServiceArea_OHHMOR304 ServiceArea_OHHMTA740 ServiceArea_OHHPAR304 ServiceArea_OHHPOR740 ServiceArea_OHHWAV740 ServiceArea_OHHZAN740 ServiceArea_OHIAKR330 ServiceArea_OHIALL330 ServiceArea_OHIASH419 ServiceArea_OHIAUR330 ServiceArea_OHIBCY419 ServiceArea_OHIBER440 ServiceArea_OHIBUT419 ServiceArea_OHICAN330 ServiceArea_OHICIN513 ServiceArea_OHICIR740 ServiceArea_OHICLB330 ServiceArea_OHICLE216 ServiceArea_OHICOL614 ServiceArea_OHICOV606 ServiceArea_OHIDAY937 ServiceArea_OHIDEL740 ServiceArea_OHIELY440 ServiceArea_OHIHAR330 ServiceArea_OHIKEN330 ServiceArea_OHILAN740 ServiceArea_OHILAW812 ServiceArea_OHILEB513 ServiceArea_OHILRN440 ServiceArea_OHIMAN419 ServiceArea_OHIMAR740 ServiceArea_OHIMED330 ServiceArea_OHIMID513 ServiceArea_OHIMRY937 ServiceArea_OHIMTV740 ServiceArea_OHINCA937 ServiceArea_OHINEW740 ServiceArea_OHINOR419 ServiceArea_OHIOBE440 ServiceArea_OHIOXF513 ServiceArea_OHIPIQ937 ServiceArea_OHIPSV440 ServiceArea_OHIRAV330 ServiceArea_OHISAN419 ServiceArea_OHISGF937 ServiceArea_OHITRO937 ServiceArea_OHITRT937 ServiceArea_OHIWAR330 ServiceArea_OHIWIL937 ServiceArea_OHIWOO330 ServiceArea_OHIXEN937 ServiceArea_OHIYNG330 ServiceArea_OKCARD580 ServiceArea_OKCBAR918 ServiceArea_OKCBEN501 ServiceArea_OKCBTN501 ServiceArea_OKCCAB501 ServiceArea_OKCCHC405 ServiceArea_OKCCON501 ServiceArea_OKCEMP316 ServiceArea_OKCEND580 ServiceArea_OKCFAY501 ServiceArea_OKCFTS501 ServiceArea_OKCJUN785 ServiceArea_OKCLAW580 ServiceArea_OKCLRK501 ServiceArea_OKCMAN785 ServiceArea_OKCMCA918 ServiceArea_OKCMCP316 ServiceArea_OKCMUS918 ServiceArea_OKCOKC405 ServiceArea_OKCSAL785 ServiceArea_OKCSTW405 ServiceArea_OKCTUL918 ServiceArea_OKCWIC940 ServiceArea_OMAAMS515 ServiceArea_OMACDR319 ServiceArea_OMADES515 ServiceArea_OMAGRN515 ServiceArea_OMAIWC319 ServiceArea_OMALNC402 ServiceArea_OMANEW515 ServiceArea_OMAOMA402 ServiceArea_PHIALL484 ServiceArea_PHIARD610 ServiceArea_PHIAVD610 ServiceArea_PHIBRI856 ServiceArea_PHICAM302 ServiceArea_PHICAP609 ServiceArea_PHICHC215 ServiceArea_PHICTR610 ServiceArea_PHIDOV302 ServiceArea_PHIELK443 ServiceArea_PHIGEO302 ServiceArea_PHIJEN215 ServiceArea_PHILAN717 ServiceArea_PHIMER609 ServiceArea_PHIMID302 ServiceArea_PHIMIL302 ServiceArea_PHIMIV856 ServiceArea_PHIMUL609 ServiceArea_PHIPHI215 ServiceArea_PHIPLS609 ServiceArea_PHIRDN484 ServiceArea_PHISAL856 ServiceArea_PHISCR717 ServiceArea_PHITRT609 ServiceArea_PHIVIN609 ServiceArea_PHIWIL302 ServiceArea_PHIWLW609 ServiceArea_PHXGLE623 ServiceArea_PHXPHX602 ServiceArea_PHXSCO480 ServiceArea_PHXTUC520 ServiceArea_PITBUT412 ServiceArea_PITCAR412 ServiceArea_PITCOR412 ServiceArea_PITFOR412 ServiceArea_PITGIB412 ServiceArea_PITGRE412 ServiceArea_PITHOM412 ServiceArea_PITIND724 ServiceArea_PITMNG412 ServiceArea_PITMON412 ServiceArea_PITNEW412 ServiceArea_PITROC412 ServiceArea_PITSOM814 ServiceArea_PITUNT412 ServiceArea_PITWAS412 ServiceArea_PITWEI304 ServiceArea_PITWHE304 ServiceArea_SANAUS512 ServiceArea_SANCOC254 ServiceArea_SANCRP512 ServiceArea_SANFRE830 ServiceArea_SANGEO512 ServiceArea_SANGIL830 ServiceArea_SANKER830 ServiceArea_SANKIL254 ServiceArea_SANLAM512 ServiceArea_SANMCA210 ServiceArea_SANREF361 ServiceArea_SANROM956 ServiceArea_SANSAN210 ServiceArea_SANSMC512 ServiceArea_SANTEM254 ServiceArea_SANWOO361 ServiceArea_SDABRK605 ServiceArea_SDASFL605 ServiceArea_SDAWTR605 ServiceArea_SEAABN253 ServiceArea_SEAALB541 ServiceArea_SEABEA503 ServiceArea_SEABLG360 ServiceArea_SEABLV425 ServiceArea_SEABOI208 ServiceArea_SEACDA208 ServiceArea_SEACHE360 ServiceArea_SEACOR541 ServiceArea_SEADAL503 ServiceArea_SEAEUG541 ServiceArea_SEAEVE425 ServiceArea_SEALEW208 ServiceArea_SEALVW360 ServiceArea_SEAMON503 ServiceArea_SEAMTV360 ServiceArea_SEAOKH360 ServiceArea_SEAOLY360 ServiceArea_SEAPOR503 ServiceArea_SEASAL503 ServiceArea_SEASEA206 ServiceArea_SEASIL360 ServiceArea_SEASPO509 ServiceArea_SEATAC253 ServiceArea_SEAVAN360 ServiceArea_SEWGTP541 ServiceArea_SEWHDR541 ServiceArea_SEWKEN509 ServiceArea_SEWKHF541 ServiceArea_SEWMED541 ServiceArea_SEWMLF541 ServiceArea_SEWPAS509 ServiceArea_SEWPEN541 ServiceArea_SEWPRO509 ServiceArea_SEWROS541 ServiceArea_SEWSUN509 ServiceArea_SEWWAL509 ServiceArea_SEWYAK509 ServiceArea_SFRCBL408 ServiceArea_SFRCON925 ServiceArea_SFRCRU831 ServiceArea_SFRDAN925 ServiceArea_SFRDSR925 ServiceArea_SFRFAI707 ServiceArea_SFRHAY510 ServiceArea_SFROAK510 ServiceArea_SFROAK925 ServiceArea_SFRPAL650 ServiceArea_SFRRCM510 ServiceArea_SFRROC916 ServiceArea_SFRSAC916 ServiceArea_SFRSCL408 ServiceArea_SFRSFR415 ServiceArea_SFRSFS650 ServiceArea_SFRSGS530 ServiceArea_SFRSMO650 ServiceArea_SFRSRO707 ServiceArea_SFRVAC707 ServiceArea_SFRWLC925 ServiceArea_SFRWOO530 ServiceArea_SFRWTV831 ServiceArea_SFURED530 ServiceArea_SFUSAC530 ServiceArea_SHECAR717 ServiceArea_SHECHA717 ServiceArea_SHEEDI540 ServiceArea_SHEFTR540 ServiceArea_SHEHAG301 ServiceArea_SHEHAR540 ServiceArea_SHEMAR304 ServiceArea_SHEMYE301 ServiceArea_SHEWIN540 ServiceArea_SHEYOR717 ServiceArea_SLCKAY801 ServiceArea_SLCOGD801 ServiceArea_SLCPRK435 ServiceArea_SLCPRO801 ServiceArea_SLCSLC801 ServiceArea_SLCTOO801 ServiceArea_SLUSTG435 ServiceArea_STLCHA636 ServiceArea_STLCHE636 ServiceArea_STLCMB573 ServiceArea_STLCOL618 ServiceArea_STLCPG573 ServiceArea_STLCRD618 ServiceArea_STLFUL573 ServiceArea_STLHNB573 ServiceArea_STLJEF573 ServiceArea_STLJOP417 ServiceArea_STLJOS816 ServiceArea_STLOZA573 ServiceArea_STLPIT316 ServiceArea_STLPRB573 ServiceArea_STLQUI217 ServiceArea_STLROL573 ServiceArea_STLSED660 ServiceArea_STLSPR417 ServiceArea_STLSTL314 ServiceArea_VAHCHL804 ServiceArea_VAHDAN804 ServiceArea_VAHLYN804 ServiceArea_VAHMTN540 ServiceArea_VAHRAD540 ServiceArea_VAHROA540 ServiceArea_VAHSTA540 ServiceArea_VAHWAY540 CreditRating_2-High CreditRating_3-Good CreditRating_4-Medium CreditRating_5-Low CreditRating_6-VeryLow CreditRating_7-Lowest PrizmCode_Rural PrizmCode_Suburban PrizmCode_Town Occupation_Crafts Occupation_Homemaker Occupation_Other Occupation_Professional Occupation_Retired Occupation_Self Occupation_Student MaritalStatus_No MaritalStatus_Yes Cluster Segment
0 3000002 Yes 24.00 219.0 22.0 0.25 0.0 0.0 -157.0 -19.0 0.7 0.7 6.3 0.0 0.0 0.0 0.0 58.0 24.0 0.0 0.3 61 2 1 2.0 361.0 62.0 0.0 0 0 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 4 0 0 30.0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 3 3
1 3000010 Yes 16.99 10.0 17.0 0.00 0.0 0.0 -4.0 0.0 0.3 0.0 2.7 0.0 0.0 0.0 0.0 5.0 1.0 0.0 0.0 58 1 1 2.0 1504.0 40.0 42.0 1 0 0 0 0 1 1 1 0 0 1 1 0 0 1 0 0 5 0 0 30.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 3 3
2 3000014 No 38.00 8.0 38.0 0.00 0.0 0.0 -2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.0 1.3 3.7 0.0 0.0 60 1 1 1.0 1812.0 26.0 26.0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 6 0 0 0.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 3 3
3 3000022 No 82.28 1312.0 75.0 1.24 0.0 0.0 157.0 8.1 52.0 7.7 76.0 4.3 1.3 370.3 147.0 555.7 303.7 0.0 22.7 59 2 2 9.0 458.0 30.0 0.0 0 0 1 0 0 1 1 1 0 0 0 1 0 0 1 0 0 6 0 0 10.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1
4 3000026 Yes 17.14 0.0 17.0 0.00 0.0 0.0 0.0 -0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 53 2 2 4.0 852.0 46.0 54.0 0 0 0 0 0 1 1 1 0 0 1 1 0 0 0 1 0 9 0 1 10.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 3 3
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51047 entries, 0 to 51046
Columns: 816 entries, CustomerID to Segment
dtypes: float64(24), int32(18), int64(9), object(1), uint8(764)
memory usage: 53.9+ MB
None

Customer Segmentation¶

To group customers into different segments, we use the K-means clustering algorithm. To find the optimal number of clusters, we use the Elbow method and the KneeLocator library. After identifying the optimal number of clusters, we perform clustering and add the cluster labels to the dataset.

In [6]:
def kmeans_inertia(k, scaled_data):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(scaled_data)
    return kmeans.inertia_

# Perform customer segmentation
def customer_segmentation(df, max_k=10):
    X = df.drop(['CustomerID','Churn'], axis=1)
    #print(list(X.columns))
    X = StandardScaler().fit_transform(X)
    
    # Find the optimal number of clusters (K) using the Elbow method
    sum_of_squared_distances = []
    k_values = range(1, max_k+1)

    sum_of_squared_distances= (Parallel(n_jobs=-1)(delayed(kmeans_inertia)(k, X) for k in k_values))

    # Identify the elbow point programmatically using the KneeLocator
    kneedle = KneeLocator(k_values, sum_of_squared_distances, curve='convex', direction='decreasing')
    optimal_k = kneedle.knee
    print(f"Optimal K value: {optimal_k}")
    
    model = KMeans(n_clusters=optimal_k, random_state=42)
    labels = model.fit_predict(X)
    
    # Calculate the mean value of each feature for each cluster
    df['Cluster'] = labels
    feature_means_by_cluster = df.drop(['CustomerID','Churn'], axis=1).groupby('Cluster').mean()

     # Rank the features by their importance within each cluster
    top_features_by_cluster = feature_means_by_cluster.apply(lambda s: s.abs().nlargest(5).index.tolist(), axis=1)

    # Create a grouped bar chart of the top 5 features for each cluster with unique colors
    fig = go.Figure()

    colors = px.colors.qualitative.Plotly[:optimal_k]
    x_labels = []
    for cluster, top_features in top_features_by_cluster.iteritems():
        cluster_data = feature_means_by_cluster.loc[cluster, top_features]
        cluster_data_percentage = cluster_data / cluster_data.sum() * 100
        for feature, percentage in zip(top_features, cluster_data_percentage):
            fig.add_trace(go.Bar(
                x=[f"Cluster {cluster} - {feature}"],
                y=[percentage],
                name=f"Cluster {cluster} - {feature}",
                marker_color=colors[cluster]
            ))
            x_labels.append(f"Cluster {cluster} - {feature}")

    fig.update_layout(
        barmode='group',
        xaxis_title="Clusters and Top Features",
        yaxis_title="Percentage",
        title=f"Top 5 Features for Each Cluster",
        xaxis={'tickmode': 'array', 'tickvals': x_labels, 'ticktext': x_labels, 'tickangle': -45},
        showlegend=False
    )

    fig.show()

    return labels
In [11]:
%%time

# Run customer segmentation
labels = customer_segmentation(df)
df['Segment'] = labels
Optimal K value: 4
CPU times: total: 39.9 s
Wall time: 31.4 s

Model training and evaluation¶

We use the XGBoost classifier to train our model. XGBoost is an efficient and powerful algorithm that works well with imbalanced data. To find the best hyperparameters for the model, we perform a grid search with cross-validation.

We evaluate the model using various classification metrics, such as precision, recall, F1-score, and accuracy. These metrics help us understand the performance of the model and guide us in making improvements.

In [8]:
# Train churn prediction model
def train_churn_prediction_model(df):
    X = df.drop(['CustomerID', 'Churn'], axis=1)
    y = df['Churn'].map({'Yes': 1, 'No': 0})
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Resample the training data using SMOTE
    smote = SMOTE(random_state=42)
    X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)
    
    # Create a pipeline for preprocessing, feature selection, and model training
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('selector', SelectKBest(k=10)),
        ('classifier', xgb.XGBClassifier(use_label_encoder=False, random_state=42, eval_metric='logloss'))
    ])
    
    # Define a parameter grid for hyperparameter tuning
    param_grid = {
        'classifier__n_estimators': [100, 200, 500],
        'classifier__learning_rate': [0.01, 0.1, 0.2],
        'classifier__max_depth': [3, 6, 9],
        'classifier__subsample': [0.5, 0.8, 1],
        'classifier__colsample_bytree': [0.5, 0.8, 1],
    }
    
    # Perform a grid search using cross-validation
    grid_search = GridSearchCV(pipeline, param_grid, scoring='recall', cv=5, n_jobs=-1, verbose=1)
    grid_search.fit(X_train_resampled, y_train_resampled)
    
    # Train the model with the best hyperparameters
    best_model = grid_search.best_estimator_
    best_model.fit(X_train_resampled, y_train_resampled)
    
    # Make predictions on the test set
    y_pred = best_model.predict(X_test)

    # create the confusion matrix
    cm = confusion_matrix(actual_labels, predicted_labels)

    # create the heatmap
    heatmap = go.Heatmap(z=cm, x=['0', '1'], y=['0', '1'], colorscale='Blues')

    # create the layout
    layout = go.Layout(title='Confusion Metrix')

    # create the figure
    fig = go.Figure(data=[heatmap], layout=layout)

    # show the figure
    fig.show()

    # Print the classification report
    print(classification_report(y_test, y_pred))
    print("Best parameters found: ", grid_search.best_params_)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("F1 Score:", f1_score(y_test, y_pred, average='weighted'))
    print("Precision:", precision_score(y_test, y_pred, average='weighted'))
    print("Recall:", recall_score(y_test, y_pred, average='weighted'))
    
    return best_model
In [9]:
%%time

model = train_churn_prediction_model(df)
Fitting 5 folds for each of 243 candidates, totalling 1215 fits
              precision    recall  f1-score   support

           0       0.72      0.71      0.72      7308
           1       0.30      0.32      0.31      2902

    accuracy                           0.60     10210
   macro avg       0.51      0.51      0.51     10210
weighted avg       0.60      0.60      0.60     10210

Best parameters found:  {'classifier__colsample_bytree': 0.8, 'classifier__learning_rate': 0.2, 'classifier__max_depth': 9, 'classifier__n_estimators': 500, 'classifier__subsample': 1}
Accuracy: 0.597845249755142
F1 Score: 0.6011637045432573
Precision: 0.6047614016367154
Recall: 0.597845249755142
CPU times: total: 14min 32s
Wall time: 19min 14s

Conclusion¶

The churn prediction model developed in this project helps the telecom company to identify customers at risk of churn. By proactively addressing customer needs and concerns, the company can improve customer retention and reduce the negative impact of customer churn on its business. This project demonstrates the ability to apply machine learning techniques and data preprocessing methods to solve a real-world business problem effectively.